Highly Relevant Documents Lost in CLIR: Experiments with Dictionary Translation and Pseudo-Relevance Feedback

Authors

  • Raija Lehtokangas
  • Heikki Keskustalo
Abstract

Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. First, monolingual baseline queries were automatically formed from the topics. Secondly, source language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish), using structured target queries. The effectiveness of the translated queries was compared to that of the monolingual queries. Thirdly, pseudo-relevance feedback was used to expand the original target queries. CLIR performance was evaluated using three relevance thresholds: stringent, regular, and liberal. When the regular or liberal threshold was used, reasonable performance was achieved. Adopting the stringent threshold caused a considerable loss of performance compared to monolingual Finnish performance, with or without pseudo-relevance feedback.
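
The three relevance thresholds mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a four-point graded scale (0 to 3) and a particular mapping of thresholds to minimum grades; neither is stated in the abstract, and the names THRESHOLDS and average_precision are hypothetical. The idea is only that the same ranked list is scored several times, each time counting as relevant only the documents whose grade meets the chosen threshold.

```python
from typing import Dict, List

# ASSUMPTION: a 0..3 graded scale and this threshold-to-grade mapping are
# illustrative only; the abstract does not specify either.
THRESHOLDS: Dict[str, int] = {"liberal": 1, "regular": 2, "stringent": 3}


def average_precision(ranking: List[str], grades: Dict[str, int], min_grade: int) -> float:
    """Non-interpolated average precision for one topic.

    A document counts as relevant only if its graded assessment is at
    least `min_grade`; unjudged documents are treated as non-relevant.
    """
    relevant = {doc for doc, grade in grades.items() if grade >= min_grade}
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    ranking = ["d1", "d2", "d3", "d4"]
    grades = {"d1": 1, "d2": 3, "d3": 0, "d4": 2}
    for name, min_grade in THRESHOLDS.items():
        print(name, average_precision(ranking, grades, min_grade))
```

Under a stricter threshold fewer documents count as relevant, so a translated query that retrieves mostly marginally relevant documents can score well under the liberal threshold and poorly under the stringent one.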

Related resources

Corpus-based CLIR in retrieval of highly relevant documents

IR systems’ ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the WWW. Our aim was to find out how corpus-based CLIR manages in retrieving highly relevant documents. We created a Finnish-Swedish comparable corpus and used it as a source of knowledge for query translation. Finnish test queries were translated into S...

Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments

Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for transitive dictionary-based CLIR using graded relevance assessments in a best match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests....

University of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion

Pseudo-relevance feedback, while useful in monolingual applications for refining and enriching short user queries, proves even more important in cross-language information retrieval (CLIR). For CLIR, query expansion before and after translation can provide an opportunity to recover from translation gaps, reduce ambiguity, and enhance recall. Furthermore, for CLIR in unsegmented Asian languages, ...

Two Stages Refinement of Query Translation for Pivot Language Approach to Cross Lingual Information Retrieval: A Trial at CLEF 2003

This paper reports experimental results of cross-lingual information retrieval from German to Italian. The authors are concerned with CLIR in the case that available language resources are very limited. Thus transitive translation of queries using English as a pivot language was used to search Italian document collections for German queries without any direct bilingual dictionary or MT system o...

Dublin City University at CLEF 2007: Cross Language Speech Retrieval (CL-SR) Experiments

Dublin City University participated in the CLEF 2007 CL-SR English task. For CLEF 2007 we concentrated primarily on the issues of topic translation, combining this with the search field combination and pseudo-relevance feedback methods used for our CLEF 2006 submissions. Topics were translated into English using the Yahoo! BabelFish free online translation service combined with domain-specific ...

Publication date: 2006